Order scheduling optimization in manufacturing enterprises based on MDP and dynamic programming

In the era of Industry 4.0, order scheduling is a crucial link in the production of manufacturing enterprises. In view of order scheduling in manufacturing enterprises, a finite horizon Markov decision process model is proposed in this work based on two sets of equipment and three types of orders with different production lead times to maximize the revenue in manufacturing production systems. Then, the dynamic programming model is incorporated into the optimal order scheduling strategy. Python is employed to simulate the order scheduling in manufacturing enterprises. Based on survey data, the superiority of the proposed model compared to traditional first come, first served order scheduling is verified by experimental cases. Finally, sensitivity analysis is conducted on the longest service hours of the devices and the order completion rate to explore the applicability of the proposed order scheduling strategy.

In the era of Industry 4.0, manufacturing enterprises with limited production capacity must make reasonable order scheduling and scientific production arrangements to meet growing customer demands. In this way, they can fully utilize the existing production capacity and ensure mass production and timely delivery, effectively shortening the order lead time, reducing order delay, and maximizing customer demand satisfaction [1][2][3] . Unscientific order scheduling substantially slows the development of enterprises. On the one hand, an excessive demand for orders cannot be met with the conventional production capacity of a production line, so enterprises must address this problem through other means. Meanwhile, the overtime production of enterprises is prone to excessive equipment load, shortening the working life of machines and affecting the normal operation of the enterprise. Subcontracting orders increases costs and reduces profits and is not conducive to the long-term development of the enterprise. If enterprises improve their production capacity by increasing or upgrading equipment, the demand for orders can be satisfied, but this approach increases the fixed-variable cost and the business risk of the enterprises. In the postpandemic age of 2022, it is even more important for enterprises to respond flexibly and efficiently to rapidly changing needs and new challenges. An increasing number of enterprises believe that blindly increasing production capacity will result in a fatal blow to development. On the other hand, insufficient order scheduling will lower the current risks of the enterprise, but it may cause a waste of production resources, which will result in reduced profits, lost customers, and impairment of long-term development. 
Therefore, formulating a set of order scheduling strategies that comprehensively considers the economic benefits and long-term development of the enterprise and solves the production capacity fluctuation caused by the "imbalance of busy and idle time" of the production line is important for current manufacturing enterprises [4].
Generally, the orders of manufacturing enterprises are based on various customer needs and can usually be classified as follows: standard orders, which have a standard bill of material with a fixed lead time and stock; nonstandard orders, which refer to customized products without BOMs; and emergency orders, which are short delivery requests. Different types of orders reflect various demand characteristics of customers. Standard orders and nonstandard orders give the manufacturing enterprises specific lead times, while emergency orders come randomly on the day of production scheduling. Emergency orders are urgent, regardless of cost. Usually, the customer will bear the component transportation costs and emergency service costs, or the enterprise must accept a large delay penalty due to delayed delivery time. Therefore, an emergency order should be arranged for production immediately when it is accepted; otherwise, it will cause a higher rejection cost. The customer required delivery (CRD) for standard and nonstandard orders is relatively relaxed, allowing for reasonable waiting during the specified lead time. In addition, different types of orders require different production times in practice [5][6][7][8] .
There are two major tasks for the effective management of production scheduling in manufacturing enterprises. One is to create JIT (just-in-time) production scheduling to respond to customer demand, that is, the

Literature review
Order scheduling of manufacturing enterprises is an extensive research topic worldwide. Practical production and manufacturing are affected by various factors, such as the arrival of emergency orders, order cancellation, and raw material interruption. Static production scheduling therefore faces great difficulties in practice, and dynamic production scheduling is increasingly implemented. Nie et al. [13] studied dynamic single-machine order production scheduling with workpiece release time and proposed a production scheduling rule constructor based on gene expression programming to construct effective production scheduling rules. Targeting high efficiency and stability, Fat et al. [14] studied the dynamic production scheduling of flexible operation workshops, proposed a metaheuristic algorithm based on the genetic algorithm, and achieved effective computational experimental results. Vinod et al. [15] simulated the interaction between the delivery time distribution method and production scheduling rules in the typical production system of a dynamic operation workshop in various application scenarios. Pickardt et al. [16] proposed a two-stage hyperheuristic algorithm by combining the genetic planning algorithm with the evolutionary algorithm to generate a set of production scheduling rules to solve complex dynamic production scheduling in an operation workshop in the semiconductor manufacturing industry. Hamzadayi et al. [17] proposed fully reactive order scheduling based on the simulated annealing algorithm and allocation rules and optimized the dynamic production scheduling of multiple equivalent parallel machines controlled by a single server using an event-driven production scheduling strategy. Rajabinasab et al. [18] investigated the dynamic production scheduling of a class of flexible operation workshops and considered dynamic events (such as the random arrival of workpieces, uncertain processing time, accidental machine failure, and path and process flexibility). They then developed a multiagent production scheduling system and obtained an efficient and robust production scheduling strategy. Zhou et al. [19] proposed a task scheduling method in a dynamic cloud manufacturing environment with random arrival tasks.
Based on MDP theory, Wang et al. [20] optimized a resource scheduling strategy in the case of dynamic change. Through mathematical modeling and software solutions, they identified the best scheduling strategy under specific conditions, ensuring timeliness. On the basis of traditional production scheduling, Qian et al. [21] proposed a more complete model for the order acceptance of MTO (make-to-order) enterprises. In addition to the delayed delivery cost, rejection cost, production cost, and other traditional model elements, they considered the inventory cost of orders and various customer priority factors. Finally, they modeled the optimal production scheduling decision as an MDP. Related scheduling strategies and algorithms are summarized in Tables 1 and 2.
At present, there are many different research directions and solution methods for order scheduling and production scheduling in manufacturing enterprises, laying a foundation for further research. However, the current literature on production line order scheduling does not fully consider the characteristics of different types of customer orders because model establishment and solution become more complex as more factors are considered. There are still few studies on multiple pieces of production line equipment, and the different production times for various types of orders are rarely included, even though differences in order production time affect the optimal scheduling of production capacity.
This paper considers two production lines with three types of orders and their different production times. Based on reasonable hypotheses, an MDP in the finite time domain is established, and the optimal production scheduling strategy for orders is obtained using Python. Then, the applicability of the proposed production scheduling strategy is checked through sensitivity analysis.

Model description
This paper studies the optimal production scheduling strategy for two production lines and three types of production orders requiring different production times and revenues. Production scheduling for manufacturing enterprises aims to maximize the benefits in terms of production equipment. When an order arrives, the PRP system immediately decides whether to accept it according to the decision rules and offers a reasonable production schedule in accordance with the set production scheduling rules.

Model assumption.
Assumption (H1) Production line equipment A and equipment B, which are no different in terms of service condition and production capacity, are allowed to produce an order simultaneously.

Assumption (H2) The production capacity of the production equipment is measured in time slots, whose length can be set according to the actual situation, and the longest service hours per day for equipment A and equipment B are the same.
Assumption (H3) The production times required by different types of orders are different and can be represented by the number of time slots. The inspection time of nonstandard orders is longer than the production time for standard orders (usually due to more processes).

Assumption (H4) The entire production cycle can be divided into a limited number of equal moments. At any moment, the production scheduling requests of standard orders and nonstandard orders are entered into the system with different probabilities, which are independent of each other. Emergency orders arrive randomly on the production day of the production line.

Assumption (H5) At any production moment, at most one production request is entered for each piece of equipment; that is, there are at most two production requests at any production moment.

Assumption (H6) The order arrangements of equipment A and equipment B are independent of each other. At any production time, there are only three possibilities for each piece of equipment, namely, a production request for a standard order, a nonstandard order, or a random emergency order.
Assumption (H7) At any moment, the production scheduling rules of the system are set as follows. If the system accepts a production request for only one order (a standard order or a nonstandard order), the equipment with the larger surplus capacity is preferred; if the surplus capacities of the two pieces of equipment are equal, equipment A is selected. If the system accepts production requests for a standard order and a nonstandard order, the equipment with more surplus capacity is given to the nonstandard order. If the system accepts production requests for two orders of the same type (two standard orders or two nonstandard orders), they are assigned to equipment A and equipment B separately.

Based on the above assumptions, scheduling of the production line can be described using an MDP in the finite time domain, and maximizing the enterprise revenue can be transformed into obtaining the optimal solution of a dynamic program. In an MDP, the decision maker periodically or continuously observes a random dynamic system with the Markov property and makes decisions sequentially. The evolution of the system depends only on the current state and the selected action, not on the historical states and actions. The MDP model can make the optimal decision satisfying the conditions at any production time based on the system state and the order arrangement, as shown in Fig. 1. The MDP can be described by a quintuple:

$$\{T, S, A, P, R\}$$

where $T$, $S$, and $A$ are the sets of decision stages, system states, and available actions, respectively; $P$ is the state transition probability, and $P(s' \mid s, a)$ indicates the probability of reaching state $s'$ after taking action $a$ in state $s$; $R$ is the reward function, and $R(s, a)$ represents the immediate reward for executing action $a$ in state $s$.
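The assignment rule in Assumption (H7) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `assign_orders`, the string labels for order types, and the list-based return format are assumptions made here and do not come from the paper's simulation code.

```python
from typing import List, Tuple

# Hypothetical labels for the two order types that are scheduled during the cycle.
STANDARD = "standard"
NONSTANDARD = "nonstandard"


def assign_orders(accepted: List[str], t_a: int, t_b: int) -> List[Tuple[str, str]]:
    """Map accepted orders to equipment "A" or "B" following Assumption (H7).

    accepted: order types accepted at this moment (zero, one, or two entries).
    t_a, t_b: remaining time slots of equipment A and equipment B.
    """
    if any(order not in (STANDARD, NONSTANDARD) for order in accepted):
        raise ValueError("unknown order type")
    if len(accepted) > 2:
        raise ValueError("at most two orders can arrive at one moment (H5)")

    # Equipment with the larger surplus capacity; ties go to equipment A.
    larger, smaller = ("A", "B") if t_a >= t_b else ("B", "A")

    if not accepted:
        return []
    if len(accepted) == 1:
        return [(accepted[0], larger)]
    if accepted[0] == accepted[1]:
        # Two orders of the same type are split across A and B.
        return [(accepted[0], "A"), (accepted[1], "B")]
    # One standard and one nonstandard order: the nonstandard order
    # gets the equipment with more surplus capacity.
    return [(NONSTANDARD, larger), (STANDARD, smaller)]
```

For example, `assign_orders([STANDARD, NONSTANDARD], 3, 7)` places the nonstandard order on equipment B, which has more remaining slots, and the standard order on equipment A.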
Model parameter settings. In this paper, the maximum number of available time slots per day is assumed to be the same for equipment A and equipment B, so the total number of available time slots for the production line per day is twice that of a single piece of equipment. The whole production cycle is divided into equal production moments, expressed as t = T, T − 1, …, 2, 1. The production request of an order arrives at one of these moments with a certain probability. t = T and t = 0 are the beginning and end of the production cycle, respectively. The other parameter settings are listed in Table 3.
Model establishment. According to the five elements of the MDP (decision stage, state set, decision set, transfer probability, and reward), the MDP model in the production cycle is established, and the objective function is solved.
Decision stage T. It refers to any production moment t in the production cycle, that is, t = T, T − 1, …, 2, 1.

Table 3. Model parameter settings.
Symbol        Description
$\lambda_1$   The probability that a standard order reaches the production system
$\lambda_2$   The probability that a nonstandard order reaches the production system
$Q$           The number of emergency orders that randomly arrive on a production day
$\pi_1$       The income obtained from a standard order
$\pi_2$       The income obtained from a nonstandard order
$\pi_3$       The income obtained from an emergency order

Decision set A. At each production moment t = T, T − 1, …, 2, 1, the system makes a decision based on the current number of remaining time slots and the arrangement of the order, that is, to accept or reject a production request. The action of the system is represented by $A_t(T_A, T_B)$, where 0 and 1 represent the rejection and acceptance of the production request, respectively:

$$A_t(T_A, T_B) = \begin{cases} 0, & \text{reject the order} \\ 1, & \text{accept the order} \end{cases}$$

Transfer probability P. There are three possibilities (a standard order, a nonstandard order, and no production task) for the order arrangement of each piece of equipment at any production moment. The production system may therefore be in one of six arrival situations at any time, which correspond to six different transfer probabilities (Table 4).
Reward R. The benefits of an enterprise in producing a standard order, a nonstandard order, and an emergency order are $\pi_1$, $\pi_2$, and $\pi_3$, respectively. Accordingly, the costs of rejecting a standard order, a nonstandard order, and an emergency order are expressed as $c_1$, $c_2$, and $c_3$, respectively. Generally, $\pi_1 + c_1 < \pi_2 + c_2 < \pi_3 + c_3$; that is, the comprehensive income of an emergency order is the highest and that of a standard order is the lowest.
In other words, emergency orders should be given the highest priority, and nonstandard orders are generally of higher priority than standard orders.
Objective function V. $V_t(T_A, T_B)$ represents the maximum revenue from time t, with the current state of the system $S = (T_A, T_B)$, to the end of the production cycle.
Correspondingly, a class-specific value function represents the maximum income that the enterprise can obtain from the current state of the system $S = (T_A, T_B)$ at time t (after the arrival of class i) to the end of the production cycle, where i = 1, 2, 3, 4, 5, and 6 correspond to the six different transition probabilities. According to dynamic programming theory, the optimal income of the system can be established at any time t = T, T − 1, …, 1,
where i = 1 when the production requests of two standard orders are given; i = 2 when the production requests of two nonstandard orders are given; i = 3 when the production requests of a standard order and a nonstandard order are given simultaneously; i = 4 when only a production request for a standard order is given; i = 5 when only a production request for a nonstandard order is given; and i = 6 when there is no production request.
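For orientation, the recursion for the simplest arrival classes can be written out as follows. This is a hedged sketch rather than the paper's numbered equations: the superscript notation $V_t^{i}$ and the shorthand $n = T_A + T_B$ are assumptions introduced here, $P_i$ denotes the six transfer probabilities, $e$ and $f$ are the numbers of time slots required by a standard and a nonstandard order, and the single-order cases are chosen to agree with the acceptance comparison used in the model solution. Acceptance is only feasible when enough slots remain, and the two-order classes (i = 1, 2, 3) are omitted because they also involve the choice of accepting one, both, or neither order.

```latex
% Assumed notation: V_t^i is the value after a class-i arrival; n = T_A + T_B.
\begin{align}
V_t(n)     &= \sum_{i=1}^{6} P_i \, V_t^{i}(n), \qquad t = T, T-1, \dots, 1, \\
V_t^{4}(n) &= \max\bigl\{ V_{t-1}(n - e) + \pi_1,\; V_{t-1}(n) - c_1 \bigr\}, \\
V_t^{5}(n) &= \max\bigl\{ V_{t-1}(n - f) + \pi_2,\; V_{t-1}(n) - c_2 \bigr\}, \\
V_t^{6}(n) &= V_{t-1}(n).
\end{align}
```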
When the production cycle ends and the production line starts on the same day, that is, t = 0, emergency orders may arrive, and the number of random emergency orders is $Q$. The production time of an emergency order is $g$ time slots, and the system state is $S = (T_A, T_B)$. When $\lfloor T_A/g \rfloor + \lfloor T_B/g \rfloor > Q$ (where $\lfloor a \rfloor$ means the largest integer not greater than $a$), that is, when the number of emergency orders that can be produced in the remaining time slots is greater than the number of emergency orders that actually arrive, the enterprise incurs an idle cost. Similarly, when $\lfloor T_A/g \rfloor + \lfloor T_B/g \rfloor < Q$, the enterprise incurs a corresponding rejection cost. With $T_S = \lfloor T_A/g \rfloor + \lfloor T_B/g \rfloor$, the boundary condition is given by Eq. (8), in which $a \wedge b$ represents min(a, b) and $(a)^{+}$ indicates max(a, 0).
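The end-of-cycle value can be illustrated numerically. The sketch below assumes one plausible form of the boundary condition, namely the expectation over $Q$ of $\pi_3 (T_S \wedge Q) - c_3 (Q - T_S)^{+} - c_4 (T_S - Q)^{+}$, where $c_3$ is the rejection cost of an emergency order and $c_4$ an idle cost; this form, the Poisson distribution of $Q$ taken from the numerical example, and the function names are assumptions and not a transcription of Eq. (8).

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a Poisson variable with the given mean (> 0) equals k."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def terminal_value(t_a: int, t_b: int, g: int, pi3: float, c3: float,
                   c4: float, mean_q: float, q_max: int = 200) -> float:
    """Expected end-of-cycle revenue under the ASSUMED boundary form.

    t_a, t_b: remaining slots of equipment A and B; g: slots per emergency order.
    The sum over Q is truncated at q_max, which is ample for small means.
    """
    ts = t_a // g + t_b // g  # emergency orders that still fit
    total = 0.0
    for q in range(q_max + 1):
        served = min(ts, q)        # (T_S ^ Q)
        rejected = max(q - ts, 0)  # (Q - T_S)^+
        idle = max(ts - q, 0)      # (T_S - Q)^+, idle cost charged per unused order slot
        total += poisson_pmf(q, mean_q) * (pi3 * served - c3 * rejected - c4 * idle)
    return total
```

With no remaining capacity, every emergency order is rejected, so the value reduces to minus the rejection cost times the mean number of arrivals.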

Model solution.
According to the MDP model, $A_t(T_A, T_B)^{+}$ is the set of actions that maximize the system revenue function. The boundary condition in Eq. (8) ensures the existence of an optimal solution at all production times t = T, T − 1, …, 1. After reasonable parameter setting, value iteration can be employed to solve the MDP model so that the optimal decision can be made at every moment according to the number of remaining time slots and the order arrangement. This involves accepting or rejecting the production request of an order and obtaining the specific production scheduling strategy according to the set production scheduling rules.
At any moment during production, there are three conditions (a standard order, a nonstandard order, and no production request) for the order arrangements for equipment A and equipment B. Therefore, the system may exhibit six possible conditions at any moment during production (Table 4). At t = T, T − 1, …, 1, when the system state is $S = (T_A, T_B)$, the system compares $V_{t-1}(T_A + T_B - e) + \pi_1$ with $V_{t-1}(T_A + T_B) - c_1$ to determine whether the production request of a standard order should be accepted. If $\pi_1 + c_1 \ge V_{t-1}(T_A + T_B) - V_{t-1}(T_A + T_B - e)$, the order is accepted; otherwise, it is rejected. If a nonstandard order arrives, $V_{t-1}(T_A + T_B - f) + \pi_2$ is compared with $V_{t-1}(T_A + T_B) - c_2$, and the order is accepted if $\pi_2 + c_2 \ge V_{t-1}(T_A + T_B) - V_{t-1}(T_A + T_B - f)$. According to this decision rule, the system can automatically make the optimal decision when any order arrives through multiple iterations. After deciding whether to accept an order, the production scheduling strategy is generated according to the current order situation and the set production scheduling rules.
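The comparison above can be turned into a backward-induction loop. The following is a minimal sketch under stated assumptions: only the total remaining capacity n = T_A + T_B is tracked (as in the acceptance rule), arrival classes with two orders are handled by enumerating every feasible accept/reject combination, and the boundary value is supplied by the caller. The function name `backward_induction` and its signature are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# (probability, order types arriving together); an empty tuple means no request.
Arrival = Tuple[float, Tuple[str, ...]]


def backward_induction(
    horizon: int,
    n_max: int,
    arrivals: Sequence[Arrival],
    slots: Dict[str, int],
    reward: Dict[str, float],
    reject_cost: Dict[str, float],
    terminal: Callable[[int], float],
) -> Tuple[List[float], Dict[Tuple[int, int, Tuple[str, ...]], Tuple[int, ...]]]:
    """Return the values at t = horizon for n = 0..n_max and the 0/1 decisions."""
    value = [terminal(n) for n in range(n_max + 1)]  # boundary values at t = 0
    policy: Dict[Tuple[int, int, Tuple[str, ...]], Tuple[int, ...]] = {}
    for t in range(1, horizon + 1):
        new_value = []
        for n in range(n_max + 1):
            expected = 0.0
            for prob, orders in arrivals:
                best_val, best_dec = None, None
                # Enumerate accept (1) / reject (0) for each arriving order.
                for decision in product((0, 1), repeat=len(orders)):
                    used = sum(slots[o] for o, d in zip(orders, decision) if d)
                    if used > n:
                        continue  # not enough remaining slots
                    gain = sum(reward[o] if d else -reject_cost[o]
                               for o, d in zip(orders, decision))
                    total = gain + value[n - used]
                    # ">=" lets ties favour acceptance, matching the rule's ">=".
                    if best_val is None or total >= best_val:
                        best_val, best_dec = total, decision
                expected += prob * best_val
                policy[(t, n, orders)] = best_dec
            new_value.append(expected)
        value = new_value
    return value, policy
```

Rejecting every arriving order uses no slots and is therefore always feasible, so each state has at least one admissible decision. For a single arriving order the maximization reduces to the comparison stated in the text.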

Numerical examples
The parameters of the MDP model are set to simulate the arrangement of various orders and the optimal decision-making process of the system. The production cycle is evenly divided into 48 equal moments, namely, T = 48. The arrival time of an order and the decision moment of the system are both at a production moment t, where t = 48, 47, …, 1. The maximum numbers of available time slots for equipment A and B are both set to T = 96. The numbers of time slots for a standard order, a nonstandard order, and an emergency order are e = 4, f = 4, and g = 1, respectively. The corresponding benefits and rejection costs of the three types of orders are set to $\pi_1$ = 200, $\pi_2$ = 400, $\pi_3$ = 600, $c_1$ = 100, $c_2$ = 300, and $c_3$ = 500. In addition, the idle cost per time slot is defined as $c_4$ = 100. The production request of a standard order arrives at any moment with a probability of $\lambda_1$ = 0.7, and that of a nonstandard order arrives with a probability of $\lambda_2$ = 0.2. On the production day, the number of randomly arriving emergency orders Q follows a Poisson distribution with parameter 20. Based on the production request arrival rates of the various orders, the transition probabilities $P_i$ (where i = 1, 2, 3, 4, 5, and 6) can be calculated. The corresponding arrival situations and six transition probabilities are shown in Table 5.
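The six transition probabilities follow from treating the requests on equipment A and equipment B as independent draws, each being a standard order with probability $\lambda_1$, a nonstandard order with probability $\lambda_2$, or no request otherwise. A short sketch (the function name and dictionary keys are illustrative assumptions):

```python
from typing import Dict


def arrival_probabilities(lam1: float, lam2: float) -> Dict[str, float]:
    """Probabilities of the six joint arrival situations for two pieces of equipment."""
    lam0 = 1.0 - lam1 - lam2  # probability of no request on one piece of equipment
    if min(lam1, lam2) < 0 or lam0 < -1e-12:
        raise ValueError("arrival probabilities must be non-negative and sum to at most 1")
    lam0 = max(lam0, 0.0)
    return {
        "two_standard": lam1 ** 2,             # P1
        "two_nonstandard": lam2 ** 2,          # P2
        "standard_and_nonstandard": 2 * lam1 * lam2,  # P3
        "only_standard": 2 * lam1 * lam0,      # P4
        "only_nonstandard": 2 * lam2 * lam0,   # P5
        "no_request": lam0 ** 2,               # P6
    }
```

Because the six situations are exhaustive and mutually exclusive, the values returned for any valid pair of rates sum to one, which is a convenient sanity check on a tabulated set of probabilities.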
Table 5. Arrival and transfer probabilities of orders.
Order arrival status                                      Transfer probability
Two standard order requests arrive                        $P_1 = \lambda_1^2 = 0.49$
Two nonstandard order requests arrive                     $P_2 = \lambda_2^2 = 0.04$
A standard order and a nonstandard order request arrive   $P_3 = 2\lambda_1\lambda_2 = 0.28$
Only a standard order request arrives                     $P_4 = 2\lambda_1(1 - \lambda_1 - \lambda_2) = 0.14$
Only a nonstandard order request arrives                  $P_5 = 2\lambda_2(1 - \lambda_1 - \lambda_2) = 0.04$
No production request arrives                             $P_6 = (1 - \lambda_1 - \lambda_2)^2 = 0.01$

The MDP can be simulated in Python based on the above parameter settings and dynamic programming theory. With the production request arrival rates of standard and nonstandard orders ($\lambda_1$ and $\lambda_2$), the arrivals over the production cycle can be simulated, and the optimal production scheduling strategy can be obtained through multistage iteration. The example in this paper reveals that the system generates critical values for accepting each type of order. For a standard order, there is a critical value X at any production moment, which is the minimum value of $T_A + T_B$ for which $\pi_1 + c_1 \ge V_{t-1}(T_A + T_B) - V_{t-1}(T_A + T_B - e)$, so the decision can be simplified to comparing $T_A + T_B$ with X. When $T_A + T_B \ge X$, the inequality holds and the order is accepted; otherwise, it is rejected. Nonstandard orders also have a critical value. After accepting or rejecting an order, the system arranges production based on the actual acceptance situation. If the production request of only one order is accepted, the equipment with more surplus capacity is selected for production (equipment A is selected when $T_A = T_B$). If the production requests of a standard order and a nonstandard order are both accepted, the equipment with more surplus capacity is preferentially selected for the nonstandard order (equipment A is selected to complete the nonstandard order if $T_A = T_B$). If two orders of the same type are accepted, they are assigned to equipment A and equipment B separately. The simulation results indicate that the expected revenue function $V_t(T_A + T_B)$ changes with the production time and the total number of remaining time slots, as demonstrated in Fig. 2. When production requests of two orders are entered into the system at every production moment, the changes in the critical value for accepting the first and the second orders (a standard order or a nonstandard order) at any time are illustrated in Figs. 3 and 4, respectively. According to the decision rules of the MDP model and the production scheduling rules, the optimal production scheduling strategy within the production cycle is shown in Table 6. In contrast, the decision-making process of the traditional production strategy is shown in Table 7. In the order type column, 1, 2, and 3 represent a standard order production request, a nonstandard order production request, and no production request, respectively. In the decision column, 1 means that the order is accepted, and 0 indicates that it is rejected. $T_A$ and $T_B$ represent the numbers of remaining available time slots for equipment A and equipment B at the current moment, respectively.

Figure 2. Expected revenue changes chart.
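Given a computed value function for the previous stage, the critical value X can be read off directly. The sketch below assumes the value function is stored as a list indexed by the total number of remaining time slots and that, as the text describes, the acceptance condition keeps holding once it is first met; the function name `critical_value` is hypothetical.

```python
from typing import Optional, Sequence


def critical_value(v_prev: Sequence[float], slots: int, reward: float,
                   reject_cost: float) -> Optional[int]:
    """Smallest total capacity n at which an order needing `slots` slots is accepted.

    v_prev[n] is the value of the previous stage with n remaining time slots.
    Returns None if the acceptance condition is never met on the stored range.
    """
    for n in range(slots, len(v_prev)):
        # Accept when reward + rejection cost covers the value of the slots given up.
        if reward + reject_cost >= v_prev[n] - v_prev[n - slots]:
            return n
    return None
```

For a concave toy value list such as `[0, 100, 150, 170, 180]` with one slot per order and a combined reward and rejection cost of 30, the marginal values are 100, 50, 20, and 10, so the order is first accepted at n = 3.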
In the decision-making process of the MDP, the numbers of remaining available time slots of the two pieces of equipment change with the production time (Figs. 5 and 6), which corresponds to the production scheduling strategy on the production day. Since the equipment with more surplus capacity is preferentially selected for order production at any production moment, the change in the number of remaining time slots is similar for equipment A and B. This approach balances the utilization of the production capacity and conforms to the actual situation in which both pieces of equipment produce simultaneously on the day of production.
Based on the production strategy in this paper, the maximum total revenue of the system in the production cycle is 19,600, while that of the traditional FCFS strategy is 17,000. Therefore, the scheduling strategy adopted in this paper exhibits better scheduling ability.

When the other parameters remain unchanged and the maximum number of available time slots T per day for the two pieces of equipment is changed, the total revenue of the enterprise changes with the total capacity of the system (Fig. 7). The following conclusions can be drawn. (1) When the total production capacity of the two pieces of equipment fails to satisfy the production requests of all orders, the total revenue increases with the number of available time slots. (2) The smaller the number of available time slots is, the lower the capacity of the production system, and the more obvious the advantage of the MDP production scheduling strategy over the traditional production scheduling strategy. This result is better illustrated by the changes in the total revenue growth rate of the enterprise with capacity (Fig. 8). In the figure, the MDP production strategy yields total income $Y_1$, the traditional production strategy yields total income $Y_2$, and β indicates the total income growth rate of the enterprise.
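A small helper makes the growth-rate comparison concrete. It assumes that β is the relative gain of the MDP strategy over the traditional strategy, β = (Y1 − Y2) / Y2; this definition is an assumption made for illustration, since the defining formula is not reproduced in the text.

```python
def revenue_growth_rate(y1: float, y2: float) -> float:
    """Relative revenue gain of the MDP strategy (y1) over the traditional one (y2).

    ASSUMED definition: beta = (y1 - y2) / y2.
    """
    if y2 == 0:
        raise ValueError("baseline revenue must be non-zero")
    return (y1 - y2) / y2


# Revenues reported for the numerical example: 19,600 (MDP) and 17,000 (FCFS).
beta = revenue_growth_rate(19600, 17000)
```

Under this assumed definition, the revenues of the numerical example correspond to a gain of 2,600 on a baseline of 17,000, or roughly 15.3%.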
Change in the production request arrival rate of nonstandard orders $\lambda_2$. When the other conditions remain unchanged and the production request arrival rate of nonstandard orders $\lambda_2$ is varied, the change in the total revenue of the enterprise with $\lambda_2$ is shown in Fig. 9. If the total production capacity of the system is insufficient for all the production requests of orders, the following conclusions can be drawn.
(1) As the production request arrival rate of nonstandard orders increases, the total revenue of the enterprise changes little under the MDP-based strategy but presents a decreasing trend under the traditional production scheduling strategy. This is because, when production capacity is short, enterprises reject more orders as the production request arrival rate of nonstandard orders increases, which increases the rejection cost. Under the MDP-based production scheduling strategy, enterprises can complete more nonstandard orders with higher comprehensive income, achieving a balance of total income. (2) As the production request arrival rate of nonstandard orders increases, the revenue growth rate of the MDP-based strategy relative to the traditional production scheduling strategy shows an upward trend (Fig. 10). Since the comprehensive income of nonstandard orders is higher than that of standard orders (i.e., $\pi_2 + c_2 \ge \pi_1 + c_1$), the MDP-based production scheduling strategy adjusts the critical value of order acceptance according to the change in the production request arrival rate of nonstandard orders to retain capacity for later arriving nonstandard orders.
On the other hand, more production requests for nonstandard orders will result in the rejection of more later arriving nonstandard orders under the FCFS, increasing the rejection cost.
In general, if the other variables remain unchanged and the maximum number of available time slots (T) for each piece of equipment changes, the indicators under the two production scheduling strategies change with the total production capacity, as shown in Table 8. In the table, the value at T = 80 is used as the initial benchmark, and the change value of each indicator reflects the change in the next state relative to the previous state. That is, the entry for T = 90 gives the change relative to T = 80, the entry for T = 100 gives the change relative to T = 90, and so on until T = 130. Likewise, with the other variables unchanged, the indicators change as the production request arrival rate $\lambda_2$ of nonstandard orders changes, as displayed in Table 9. $\lambda_2$ = 0.1 is defined as the initial base, and, as in Table 8, each entry gives the change in the next state relative to the previous state. Based on Tables 8 and 9, it can be concluded that the lower the system capacity and the higher the arrival rate of nonstandard orders are, the more prominent the superiority of the MDP-based production scheduling strategy over the traditional production scheduling strategy.

Conclusions
This paper proposes an MDP-based strategy to dynamically process the production scheduling of orders in manufacturing enterprises, with the objective of maximizing the benefits of production equipment. With two pieces of production equipment and three types of orders with different production times, the optimal production scheduling strategy of the system is analyzed via dynamic programming theory. After reasonable parameter setting, multistage iteration is performed with Python. The simulation results indicate that, compared with the traditional FCFS strategy, the MDP-based production scheduling strategy adopted in this paper yields higher revenue for the production system of manufacturing enterprises. Then, in the sensitivity analysis, the maximum number of available time slots of the two pieces of equipment and the arrival rate $\lambda_2$ of nonstandard orders are changed for comparative analysis. The results indicate that the MDP-based strategy is suitable for the production scheduling of manufacturing enterprises. The MDP model is superior to the traditional production decision-making model when the system capacity is insufficient and the arrival rate of nonstandard orders is high. Follow-up research can consider differences in the service capacity and efficiency of different equipment and analyze other parameters that influence the enterprise revenue rate. In addition, factors such as inventory thresholds that affect the on-time delivery rate of orders can be considered in the model.

Data availability
The datasets generated during the current study are available from the corresponding author on reasonable request.